Feature: sonic analysis provider #3516
Closed
chrisuthe wants to merge 6 commits into music-assistant:dev from
MacgyverH
reviewed
Mar 31, 2026
481723c to 043220e
729ed8a to 3d93152
b04676e to fa65f09
… base class

Provider extracts audio features from PCM streams using librosa and stores them as semantic AudioAnalysisData fields (BPM, key, mode, energy, danceability, brightness, harmonic_complexity, roughness, rhythmic_regularity, loudness, beats, duration, true_peak, wave_form).

Adapted to upstream AudioAnalysisProvider API:
- _start_analysis returns bool (replaces old start_analysis override)
- Uses streamdetails (not stream_details)
- Stores via mass.streams.audio_analysis.set_audio_analysis()
Empty frequency sets and flat chroma profiles produce harmless warnings during key detection. Now suppressed with targeted warning filters and NaN handling for zero-std correlations.
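The zero-std case described above can be sketched as follows. This is a minimal illustration, not the provider's code: `safe_correlation` is a hypothetical helper, and the choice of `RuntimeWarning` as the filtered category and `0.0` as the fallback value are assumptions.

```python
import warnings

import numpy as np


def safe_correlation(a, b) -> float:
    """Pearson correlation that returns 0.0 instead of NaN for flat inputs.

    Hypothetical helper: a flat (zero-std) profile makes np.corrcoef divide
    0 by 0, which emits a RuntimeWarning and yields NaN.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Scope the suppression to this one computation so unrelated warnings
    # elsewhere in the process stay visible.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        with np.errstate(divide="ignore", invalid="ignore"):
            r = np.corrcoef(a, b)[0, 1]
    # Assumed fallback: treat "no correlation defined" as "no correlation".
    return 0.0 if np.isnan(r) else float(r)
```

With this guard, a flat chroma profile such as `np.ones(12)` correlates as `0.0` against any key template instead of propagating NaN into the key ranking.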
fa65f09 to 7f5d82d
…scan

Override the AudioAnalysisProvider.analyze_file hook so upstream's AudioAnalysisController._run_background_scan can drive backfill through the generic provider-agnostic interface. The override loads audio via librosa, runs block feature extraction and collapse, and populates duration and true_peak.
Three cleanups in one commit:

1. Stop computing overlap fields in librosa:
     bpm           <- overlaid by smart_fades (beat_this CNN)
     key, mode     <- overlaid by smart_fades (S-KEY)
     danceability  <- overlaid by clap_analysis (zero-shot, calibrated)
   These were lower in quality than their overlay sources, and the overlay
   system guaranteed replacement at vector-assembly time. Computing them
   in librosa was wasted work; leaving their AudioAnalysisData fields as
   None makes the architecture honest. The install must have the relevant
   overlay providers enabled or vectors won't assemble; the "no valid
   signatures found" diagnostic added in the previous commit tells the
   user exactly which fields are missing when that happens.

2. Remove dead-code feature extractions:
     librosa.feature.mfcc
     librosa.feature.tonnetz
     librosa.feature.spectral_rolloff
     librosa.feature.zero_crossing_rate
   These were extracted per block and stored on BlockFeatures but never
   read by collapse_to_analysis. They are legacy from an earlier vector
   schema. Removing them saves roughly 100ms per 10s block of analyzed
   audio; for a typical 3-min track, that is ~1.8s less CPU per analysis.

3. Fix pre-existing stale field names in test_helpers.py:
     rms_energy_per_second        -> rms_energy
     spectral_centroid_per_second -> spectral_centroid
   These referenced the pre-upstream-alignment field names and had been
   silently failing 2 tests since the AudioAnalysisData model was updated.

Net: -157/+69 lines in helpers.py, test surface shrunk to match.
All 102 sonic_analysis + sonic_similarity tests pass.
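The per-track saving quoted in item 2 follows directly from the block size. A small worked check, using only the figures stated in the commit message (the constant names are illustrative):

```python
BLOCK_SECONDS = 10        # analysis block length stated in the commit message
SAVED_MS_PER_BLOCK = 100  # "roughly 100ms per 10s block"
TRACK_SECONDS = 3 * 60    # "a typical 3-min track"

blocks = TRACK_SECONDS // BLOCK_SECONDS   # 18 blocks per track
saved_ms = blocks * SAVED_MS_PER_BLOCK    # 1800 ms

print(f"{blocks} blocks, ~{saved_ms / 1000:.1f}s saved per track")
# prints "18 blocks, ~1.8s saved per track"
```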
extract_block_features previously called four librosa feature functions that each computed their own STFT internally: four redundant spectrograms per 10s block. All four (chroma_stft, spectral_contrast, spectral_centroid, spectral_flatness) share the same default n_fft=2048 / hop_length=512, so a single up-front STFT is the correct input for all of them via librosa's `S=` kwarg.

Verified byte-identical output to the old per-feature path (max abs diff = 0 on all four feature matrices). All 10 sonic_analysis tests pass unchanged.

Measured: 1.56x speedup on a 10s block (25ms -> 16ms). For a 3-min track (18 blocks), that's ~180ms saved per analyzed track. At the scale of a user's 12k-track library, that is ~36 minutes of CPU time per full background scan. Libraries with millions of tracks benefit proportionally.

RMS and onset_strength are left unchanged: RMS is time-domain, and onset_strength uses a mel spectrogram with different parameters.
Sonic Analysis Provider
This PR adds a sonic analysis provider that extracts semantic audio features from PCM audio during playback using librosa and stores them as standard AudioAnalysisData fields.

What It Does
- bpm
- key / mode
- energy
- danceability
- loudness_integrated / loudness_range
- brightness
- harmonic_complexity
- roughness
- rhythmic_regularity
- rms_energy_per_second / spectral_centroid_per_second

- … set_audio_analysis() as a plain AudioAnalysisData: no opaque blobs, no custom subclasses
- … brightness, harmonic_complexity, roughness, rhythmic_regularity

Architecture
The provider is a pure feature extraction + distillation layer. It does not store similarity vectors or compute distances — that responsibility belongs to the similarity plugin (separate stacked PR).
Both the provider and the similarity plugin depend on the shared AudioAnalysisData model contract. Any audio analysis provider that populates the same fields can feed the similarity plugin.

Code Organization
- helpers.py: Pure feature extraction (extract_block_features, merge_block_features) and semantic derivation (collapse_to_analysis with private _derive_* helpers)
- __init__.py: MA integration: PCM streaming, block accumulation, session management; _finalize() stores the result

Testing
- test_helpers.py: Tests for block extraction, merging, and collapse_to_analysis (scalar ranges, determinism, noise vs sine differentiation)
- test_provider_units.py: Tests for PCM byte conversion (16/24/32-bit, mono/stereo)

Dependencies
- audio_analysis_controller_provider branch (PR "Add Audio Analysis controller and Audio Analysis provider", #3509)
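For readers unfamiliar with the PCM byte conversion that test_provider_units.py covers (16/24/32-bit, mono/stereo), a minimal sketch follows. It assumes interleaved, little-endian, signed integer PCM and downmixes to mono by averaging channels; the function name and these conventions are assumptions, not the provider's actual implementation.

```python
import numpy as np


def pcm_bytes_to_mono_float(data: bytes, bit_depth: int, channels: int) -> np.ndarray:
    """Convert interleaved signed little-endian PCM bytes to mono float32 in [-1, 1).

    Hypothetical helper for illustration only.
    """
    if bit_depth == 16:
        samples = np.frombuffer(data, dtype="<i2").astype(np.float64) / 32768.0
    elif bit_depth == 32:
        samples = np.frombuffer(data, dtype="<i4").astype(np.float64) / 2147483648.0
    elif bit_depth == 24:
        # NumPy has no 24-bit integer type: assemble each sample from 3 bytes.
        raw = np.frombuffer(data, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
        ints = raw[:, 0] | (raw[:, 1] << 8) | (raw[:, 2] << 16)
        # Sign-extend values whose top (24th) bit is set.
        ints = np.where(ints >= (1 << 23), ints - (1 << 24), ints)
        samples = ints.astype(np.float64) / 8388608.0
    else:
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    # De-interleave into (frames, channels) and average down to mono.
    frames = samples.reshape(-1, channels)
    return frames.mean(axis=1).astype(np.float32)
```

For example, the 16-bit stereo frame `b"\x00\x40\x00\x80"` (left = 16384, right = -32768) decodes to 0.5 and -1.0, which average to -0.25.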